ABSTRACT



The purpose of this study was to examine the effect of
completion instructions on item parameters and category use as respondents
completed a self-report personality survey. Instructions either allowed
free-choice allotment of ratings (nonforced distribution) or requested that
the subject assign a certain number of ratings to the highest or lowest
rating-scale categories (forced distribution). A total of 126 sets of
participant responses under nonforced instructions were obtained from a
self-report survey completed at 1 company. The comparison sample of 346
forced distributions of ratings was collected from 5 companies. The
hypothesis that alternatively worded instructions for survey completion
would not affect item parameters was unsupported. The two sets of
instructions produced nonequivalent patterns of response and item
statistics. In addition, person separation and reliability were lower in the
forced distribution condition. Data collected under the forced condition did
not fit the Rasch model as well as did the nonforced distribution data. Five
appendixes contain the instructions for both conditions, two figures, and
three tables of data. (SLD)
The susceptibility of item parameters to instructions for completion.

Introduction 

“A statement about an empirical system is meaningful only when it is scale independent,
that is, only when it is true on all of the permissible numeric scales.”

- Townsend and Ashby (1984, p. 399)

According to Wright and Stone (1979), one of the characteristics of Rasch-based
measurement is that the difficulty of an item is independent of a person’s degree of endorsement
of it. The item measure is said to be invariant across groups and individuals. In other words, the
item is difficult relative to other items, and its difficulty is not related to the magnitude of a
person’s response to it. The theory also implies that the item difficulty will remain the same no
matter what scale we use to measure it (Engelhard, 1992).

However, few studies have been done to assess the impact of instructions on results of 
Rasch analyses. There is a dearth of evidence in the literature on equivalence of item parameters 
in studies involving manipulations of response formats and instruction sets. 

The purpose of the present investigation was to examine the effect of completion
instructions on item parameters and category usage. Instructions either (a) allowed free-choice
allotment of ratings (non-forced instructions) or (b) requested the subject to assign a certain
number of ratings to either the highest or lowest rating-scale categories (forced instructions). It
was hypothesized that:

H1: There would be no significant effects of instructions on the item calibrations.

H2: There would be no effects of instructions on the person parameter separation or
reliability.



Method 



Subjects. 

Responses from 126 participants who received the non-forced distribution (NFD)
instructions were obtained from a self-report survey completed in 1996 at one company. The
comparison sample of 346 sets of forced distribution (FD) ratings was collected from 5 companies
between 1996 and 1998. Cases with recording errors or missing responses were deleted. Samples
of equal size (n = 115) containing complete data on the 6-item Group Involvement scale were
then selected from the two instructional-set groups for purposes of comparison.

Procedure 

Copies of the two forms of the personality survey had been distributed and collected in 
previous investigations in companies across the United States. The 1996a version (NFD) had 
been administered to employees of a large utility company in 1996. The 1996b version (FD) had 
been used to survey employees in 5 large public and private corporations between 1996 and 
1998. Responses were returned directly to the researchers and individual responses were kept 
confidential. Data from the surveys were analyzed and compared across instruction groups using 
the WINSTEPS (Linacre, 1998) program for Rasch analysis. 

Instrument. 

The instrument used to collect these data was the Employee Profile® (Somerville and 
Company, Inc. 1996a, 1996b), a 144-item survey of personality characteristics relevant to work- 
related contexts. The 1996a version (NFD) contained directions to the respondents to rate 
themselves on a scale of 1 to 6 points according to how well the statements described their 
modes of behavior in work-related situations. In addition, the instructions requested that subjects 
distribute their ratings across the entire 6-point scale. See Appendix A for a sample item and its 
response format. 
The 1996b version (FD) asked participants to rate themselves on a 10-point scale (1 to
10). In addition, they were to put at least 20 of the ratings into the two lowest categories (1
and 2) and no more than 20 of the ratings into the two highest categories (9 and 10). A sample
item with this format is included in Appendix B.
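
The forced distribution rule described above can be stated as a simple check on a completed form. The sketch below is only an illustration: the function name `meets_forced_distribution`, its parameters, and the example response are hypothetical and are not part of the Employee Profile materials.

```python
from collections import Counter


def meets_forced_distribution(ratings, min_low=20, max_high=20):
    """Check one completed form against the FD rule described in the text.

    ratings:  iterable of integer ratings on the 1-10 scale.
    min_low:  minimum number of items that must be rated 1 or 2.
    max_high: maximum number of items that may be rated 9 or 10.
    """
    counts = Counter(ratings)
    if any(r < 1 or r > 10 for r in counts):
        raise ValueError("ratings must lie on the 1-10 scale")
    low = counts[1] + counts[2]
    high = counts[9] + counts[10]
    return low >= min_low and high <= max_high


# Hypothetical 144-item response: 20 low ratings, 20 high ratings, rest mid-scale.
example = [1] * 10 + [2] * 10 + [9] * 10 + [10] * 10 + [5] * 104
print(meets_forced_distribution(example))
```

A form with all 144 items rated mid-scale would fail this check because fewer than 20 items fall in categories 1 and 2.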

The Group Involvement scale consists of 6 items that describe work-related behaviors.
Group involvement describes the propensity to involve oneself in team efforts and to
publicly recognize and promote members’ contributions. See Appendix C for the items.

Results 

H1 stated that item difficulties would not differ across instruction groups. To test
this hypothesis, a Rasch model analysis was performed on each set of data, and a t-value was
computed for each item to test for differences between groups. The reported t-value was
calculated according to Wright and Stone (1979, p. 95) and indexes the degree to which the two
sample difficulty measures approximate the same difficulty parameter. As Table 1 shows,
4 of the 6 parameter estimates differed significantly (p < .05). Items 2 and 6 also
exchanged rank order. The difference between the Item 2 and Item 6 parameters was 0.24 logits
(t = 1.41, n.s.) in the NFD sample and 0.06 logits (t = .004, n.s.) in the FD sample.
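
The between-group t-values reported in Table 1 are consistent with the usual comparison of two independent estimates: the difference between the two measures divided by the square root of the sum of their squared standard errors. The sketch below assumes that form of the statistic and recomputes it from the measures and errors printed in Table 1. The names `TABLE_1` and `t_difference` are illustrative, and the 1.96 cutoff is a large-sample normal approximation rather than a value stated in the paper.

```python
import math

# (NFD measure, NFD error, FD measure, FD error) for Items 1-6, copied from Table 1.
TABLE_1 = {
    1: (-0.45, 0.13, -0.10, 0.05),
    2: (0.31, 0.12, 0.03, 0.05),
    3: (0.54, 0.11, 0.18, 0.04),
    4: (0.21, 0.12, 0.08, 0.04),
    5: (-0.69, 0.13, -0.28, 0.06),
    6: (0.07, 0.12, 0.09, 0.04),
}


def t_difference(d1, se1, d2, se2):
    """Difference between two item measures in joint standard-error units."""
    return (d1 - d2) / math.sqrt(se1 ** 2 + se2 ** 2)


t_values = {item: round(t_difference(*row), 2) for item, row in TABLE_1.items()}
significant = [item for item, t in t_values.items() if abs(t) > 1.96]

for item, t in t_values.items():
    print(f"Item {item}: t = {t:+.2f}")
print("Items exceeding |t| = 1.96:", significant)
```

Under this assumption the recomputed values agree with the printed ones to two decimals for Items 1 to 5. For Item 6 the sketch gives -0.16 where the table prints 0.16, so the sign there depends on the direction of subtraction.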



Table 1.

Comparison of item parameter estimates across instructional sets.

Item   Set   Measure   Error   t-value   Rank
1      NFD    -.45      .13     -2.51     5
       FD     -.10      .05               5
2      NFD     .31      .12      2.15     2
       FD      .03      .05               4
3      NFD     .54      .11      3.08     1
       FD      .18      .04               1
4      NFD     .21      .12      1.03     3
       FD      .08      .04               3
5      NFD    -.69      .13     -2.86     6
       FD     -.28      .06               6
6      NFD     .07      .12      0.16     4
       FD      .09      .04               2



The 6 items appeared to represent a single underlying, unidimensional construct: none of
the standardized fit statistics was greater than 2.0 (Table 2). However, the point-biserial
correlations of the items with the full scale were consistently higher in the NFD sample.
The correlations differed significantly between samples on all but Items 3 and 5 (two-tailed
test, p < .05).

Examination of the results presented in Table 3 revealed that the person ability parameter
was measured more reliably under the NFD than under the FD instruction set. Real and model
reliability estimates differed significantly between the groups (two-tailed test, p < .05). The
NFD sample also displayed greater separation between persons than did the FD sample.
Table 2.

Comparison of item fit statistics across instructional groups.

Item   Set   Infit MNSQ   Infit z   Outfit MNSQ   Outfit z   Point biserial
1      NFD      .87         -.9        .87          -1.0         .52
       FD       .95         -.4        .93           -.4         .24
2      NFD      .99         -.1       1.05            .4         .49
       FD      1.06          .5       1.07            .5         .22
3      NFD     1.03          .2        .98           -.1         .50
       FD       .95         -.5        .88          -1.0         .30
4      NFD     1.04          .3       1.01            .1         .51
       FD       .88        -1.1        .90           -.8         .24
5      NFD     1.04          .3       1.05            .3         .46
       FD      1.04          .2        .97           -.2         .32
6      NFD      .95         -.4        .97           -.2         .47
       FD      1.17         1.5       1.22           1.6         .04
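
The paper does not name the test used to compare the point-biserial correlations between samples. One common choice for two independent correlations is Fisher's r-to-z transformation, and the sketch below applies it to the Table 2 values as an assumption, with n = 115 per group as reported under Subjects. The names `POINT_BISERIAL` and `fisher_z_test` are illustrative.

```python
import math

N = 115  # respondents per instruction group, as reported under Subjects

# (NFD, FD) point-biserial correlations for Items 1-6, copied from Table 2.
POINT_BISERIAL = {
    1: (0.52, 0.24),
    2: (0.49, 0.22),
    3: (0.50, 0.30),
    4: (0.51, 0.24),
    5: (0.46, 0.32),
    6: (0.47, 0.04),
}


def fisher_z_test(r1, r2, n1, n2):
    """Two-tailed z test for the difference between two independent correlations."""
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / standard_error
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p-value under the normal curve
    return z, p


results = {
    item: fisher_z_test(r_nfd, r_fd, N, N)
    for item, (r_nfd, r_fd) in POINT_BISERIAL.items()
}
not_significant = [item for item, (_, p) in results.items() if p >= 0.05]

for item, (z, p) in results.items():
    print(f"Item {item}: z = {z:.2f}, p = {p:.3f}")
print("Not significant at .05:", not_significant)
```

With these inputs the only items that fail to reach p < .05 are Items 3 and 5, which matches the pattern described in the text, although the original analysis may have used a different procedure.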



Table 3.

Summary of measured persons.

Set   Estimate   RMSE   Adjusted Std. Dev.   Separation   Reliability
NFD   Real        .60         1.00              1.68          .74
      Model       .53         1.04              1.95          .79
FD    Real        .26          .27              1.06          .53
      Model       .23          .30              1.29          .62
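
The columns of Table 3 are related by the standard Rasch summary formulas: separation is the adjusted standard deviation divided by the root mean square error, and reliability is the squared separation divided by one plus the squared separation. The sketch below checks the printed values against those formulas. The names `TABLE_3`, `separation`, and `reliability` are illustrative, and small differences in separation are expected because the printed inputs are rounded to two decimals.

```python
# (RMSE, adjusted SD, reported separation, reported reliability), copied from Table 3.
TABLE_3 = {
    ("NFD", "Real"): (0.60, 1.00, 1.68, 0.74),
    ("NFD", "Model"): (0.53, 1.04, 1.95, 0.79),
    ("FD", "Real"): (0.26, 0.27, 1.06, 0.53),
    ("FD", "Model"): (0.23, 0.30, 1.29, 0.62),
}


def separation(adjusted_sd, rmse):
    """Spread of the measures expressed in standard-error units."""
    return adjusted_sd / rmse


def reliability(sep):
    """Proportion of observed variance that is not attributable to error."""
    return sep ** 2 / (1 + sep ** 2)


for (group, kind), (rmse, adj_sd, sep_reported, rel_reported) in TABLE_3.items():
    sep = separation(adj_sd, rmse)
    rel = reliability(sep_reported)
    print(f"{group} {kind}: separation {sep:.2f} (reported {sep_reported}), "
          f"reliability {rel:.2f} (reported {rel_reported})")
```

Reliability computed from the reported separation reproduces the printed reliability in all four rows, which illustrates why the lower person separation in the FD condition carries through to lower person reliability.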



As Table 1 shows, the item measures had greater variability in the NFD than in the FD
condition (see also Table 4). As a result, item separation was greater under the NFD set of
instructions.



Table 4.

Summary of measured items.

Set   Estimate   RMSE   Adjusted Std. Dev.   Separation   Reliability
NFD   Real        .12          .41              3.40          .92
      Model       .12          .41              3.43          .92
FD    Real        .05          .14              2.93          .90
      Model       .05          .14              3.0           .90
Discussion

The principle of invariance of item parameters suggests that the scales used to measure 
attributes should not affect the difficulty or relative order of the same items measured two different 
ways. The sample values of the item difficulty measures are expected to differ somewhat but the 
order and separation of the items would be expected to remain constant across variations of 
rating scale. 

However, the hypothesis, that alternatively worded instructions for survey completion 
would not impact item parameters, was unsupported. Asking participants to use the full range of 
their response options versus asking them to assign a proportion of their responses to the 
extreme categories produced non-equivalent patterns of response and item statistics. In addition, 
person separation and reliability were lower in the FD condition. It may be concluded that the data 
collected under the FD instructions did not fit the Rasch model as well as did the NFD data. 

It should be noted that the efficiency of the rating scales also differed between the
samples. The utilization of the scale categories was not the same across instruction sets and
rating formats (see Appendices D and E). It may be that imposing limits on responses in certain
categories of a rating scale, as was the case with the FD instructions, provoked categorical
responses such as “this is one of my 20 best/worst characteristics - yes or no.”

In addition, Andrich (1996) and van der Linden (1993) have described conditions under
which the number of categories on rating scales affected the fit of the data to the model.
However, collapsing the categories may not accurately reflect the true responses or abilities of
the persons on the items. Further investigations should include the reduced scale formats for
empirical confirmation of category utilization.

The contribution of the present investigation to knowledge in the field of measurement
and survey design is twofold. First, as Townsend and Ashby (1984) pointed out, studies of
alternative rating methods are needed before we can assume that findings generalize to all
applications. The results serve as a cautionary note to survey designers and consumers. In this
case, forcing categorical decisions onto a rating scale, alongside decisions that were to be made
on a continuum, created rather messy distributions, and this appeared to affect the statistics
derived from the sample obtained under these conditions. Certainty in conclusions based on
sample data may be compromised by violations of the assumption that item difficulty remains
invariant across types of instructions and rating formats.

Second, the results were informative in that two methods of distributing self-report ratings,
one of which appears to have remained unreported in the literature, were analyzed and compared
using Rasch model analysis. Survey researchers face tremendous odds against finding
high-precision methods of measurement in surveys of characteristics, attitudes, and behaviors.
Rasch modeling provides survey researchers with a useful avenue of investigation and a basis for
comparison of alternative techniques. Reports of investigations into the impact of instructional
sets on responses help to direct the search for more accurate ways to measure people.
APPENDIX A

Non-forced distribution instructions to respondents on the Employee Profile. 



The Employee Profile Survey was designed to assist in identifying an individual’s most 
and least prominent work behaviors and characteristics. Your ratings should be based on how 
frequently these behaviors occur or how characteristic that behavior is of you. 

Each person should exhibit some traits that are obvious to those around them. The 
behaviors demonstrated most frequently should be rated 5 or 6. This will identify your most 
prominent behavioral features. 

It would be likely that a person would not demonstrate some of these behaviors very 
often. Rate some of the items 1 or 2 to identify your least prominent behavioral characteristics. 

Respond to the remaining items with 3 or 4 ratings depending upon how often or how 
characteristic you think these behaviors are of you. 



Rating   Frequency anchor             Characteristic anchor
1        Almost Never                 Not at All Characteristic
2        Seldom or Once in a while    Slightly Characteristic
3        Regularly but Not often      Moderately Characteristic
4        Fairly Often                 Characteristic
5        Very Frequently              Very Characteristic
6        Almost Always                Extremely Characteristic

1 A. Shows a sincere interest in suggestions from members of the work group.



APPENDIX B 

Forced distribution instructions to respondents on the Employee Profile. 

Read each item and determine how characteristic of you the behavior associated with the
item is using the “1” to “10” scale. In order to ensure that your most and least characteristic traits
are identified, rate at least twenty (20) items “1” or “2”, and no more than twenty (20) items at “9”
or “10.” You may find that putting 5 items in the “1” or “2” and 5 in the “9” or “10” ranges on EACH
page is the easiest way to accomplish this. Rate the remaining items within the range “3” and “8”
depending upon your assessment of how characteristic of you each item is.



Less Characteristic   (1) (2)   (3) (4) (5) (6) (7) (8)   (9) (10)   More Characteristic

1 B. Shows a sincere interest in suggestions from members of the work group.
APPENDIX C

Items on the Group Involvement Scale. 

ITEM STATEMENT 

1 Shows a sincere interest in suggestions from members of the work group. 

2 Seeks out the special talents and abilities of others to contribute to the quality of the team’s
product.

3 Encourages people to speak up even when their opinions differ from the opinions of the majority. 

4 Develops a feeling of unity and sharing among co-workers. 

5 Publicly shares credit for success with those who contributed. 

6 Actively promotes the involvement of people having a stake in the outcome of the project or task. 
APPENDIX D



Non-forced Distribution Instructions (NFD) 



Category probabilities. 



[Figure: probability of response (vertical axis, 0 to 1.0) plotted against person [minus] item
measure (horizontal axis, -3 to 3), with one curve for each of rating categories 1 to 6.]



Summary of measured categories.

Category   Number of Observations   Step Calibration   Step Std. Error   Thurstone Threshold
1                   7
2                  56                    -2.91               .39               -3.01
3                  82                     -.71               .15               -1.20
4                 324                    -1.07*              .11                -.52
5                 143                     1.92               .10                1.69
6                  72                     2.77               .15                3.05
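
The category probability curves can be approximated from the step calibrations listed above. The sketch below assumes the analysis used a rating scale model of the Andrich form, in which the log-odds of responding in category k rather than k - 1 equal the person [minus] item measure minus the step calibration for category k. The names `NFD_STEPS`, `category_probabilities`, and `modal_category` are illustrative and are not WINSTEPS functions.

```python
import math

# Step calibrations for categories 2-6 under NFD instructions, from the table above.
NFD_STEPS = [-2.91, -0.71, -1.07, 1.92, 2.77]


def category_probabilities(measure, steps):
    """Probability of each category at a given person [minus] item measure.

    measure: person measure minus item measure, in logits.
    steps:   step calibrations for categories 2..m (category 1 has none).
    """
    # The log-numerator for category k is the running sum of (measure - step).
    log_numerators = [0.0]
    for step in steps:
        log_numerators.append(log_numerators[-1] + measure - step)
    peak = max(log_numerators)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in log_numerators]
    total = sum(weights)
    return [weight / total for weight in weights]


def modal_category(measure, steps):
    """Most probable category (numbered from 1) at the given measure."""
    probabilities = category_probabilities(measure, steps)
    return probabilities.index(max(probabilities)) + 1


grid = [x / 10 for x in range(-50, 51)]  # -5.0 to 5.0 logits in steps of 0.1
modal = sorted({modal_category(m, NFD_STEPS) for m in grid})
print("Categories that are most probable somewhere on the grid:", modal)
```

Under this assumption category 3 is never the most probable response at any point on the grid, because its step calibration (-.71) is higher than the step calibration for category 4 (-1.07). This is presumably the disordering that the asterisk in the table flags.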



APPENDIX E



Forced Distribution Instructions (FD) 



Category probabilities 



[Figure: probability of response (vertical axis, 0 to 1.0) plotted against person [minus] item
measure (horizontal axis, -4 to 4), with one curve for each of rating categories 1 to 10.]



Summary of measured categories.

Category   Number of Observations   Step Calibration   Step Std. Error   Thurstone Threshold
1                  14
2                 102                    -2.25               .27               -2.29
3                  25                     1.20               .11                -.43
4                  21                      .03*              .11                -.35
5                  48                     -.91*              .10                -.30
6                  85                     -.58               .09                -.21
7                 137                     -.40               .09                -.06
8                 185                     -.12               .09                 .27
9                  52                     1.59               .13                1.26
10                 21                     1.44*              .23                2.00